Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue/pool reporting #8293

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Issue/pool reporting #8293

wants to merge 14 commits into from

Conversation

wouterdb
Copy link
Contributor

@wouterdb wouterdb commented Oct 30, 2024

Description

Update database statistics reporting

  1. towards the logs
  2. towards the status enpoint
  3. towards influxdb

closes #8228

TODO post merge:

  1. inform FE that we broke the API
  2. update grafana dashboard on grafana hub

Self Check:

Strike through any lines that are not applicable (~~line~~) then check the box

  • Attached issue to pull request
  • Changelog entry
  • Type annotations are present
  • Code is clear and sufficiently documented
  • No (preventable) type errors (check using make mypy or make mypy-diff)
  • Sufficient test cases (reproduces the bug/tests the requested feature)
  • Correct, in line with design
  • End user documentation is included or an issue is created for end-user documentation (add ref to issue here: )
  • If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branche(s) (see test-fixes for more info)

@@ -101,7 +101,7 @@ src/inmanta/ast/type.py:0: error: No overload variant of "reversed" matches argu
src/inmanta/ast/type.py:0: note: Possible overload variants:
src/inmanta/ast/type.py:0: note: def [_T] __new__(cls, Reversible[_T], /) -> reversed[_T]
src/inmanta/ast/type.py:0: note: def [_T] __new__(cls, SupportsLenAndGetItem[_T], /) -> reversed[_T]
src/inmanta/server/services/databaseservice.py:0: error: Incompatible return value type (got "dict[str, object]", expected "dict[str, BaseModel | UUID | bool | int | float | datetime | str | Sequence[BaseModel | UUID | bool | int | float | datetime | str] | Mapping[str, BaseModel | UUID | bool | int | float | datetime | str]]") [return-value]
src/inmanta/server/services/metricservice.py:0: error: Incompatible types in assignment (expression has type "float", variable has type "int") [assignment]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-existing issue, moved

@@ -1101,6 +1103,7 @@ src/inmanta/server/services/orchestrationservice.py:0: error: Incompatible types
src/inmanta/server/services/orchestrationservice.py:0: error: Incompatible types in assignment (expression has type "set[str | None]", variable has type "set[str]") [assignment]
src/inmanta/server/services/orchestrationservice.py:0: error: Argument "agents" to "_trigger_auto_deploy" of "OrchestrationService" has incompatible type "Collection[str]"; expected "Sequence[str] | None" [arg-type]
src/inmanta/server/services/orchestrationservice.py:0: error: Incompatible return value type (got "ReturnValueWithMeta[Sequence[DesiredStateVersion]]", expected "ReturnValue[list[DesiredStateVersion]]") [return-value]
src/inmanta/server/agentmanager.py:0: error: Value of type "dict[str, Any] | None" is not indexable [index]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decoding of api results, difficult to avoid

Comment on lines +92 to +99
self._db_monitor = DatabaseMonitor(
data.Resource._connection_pool,
opt.db_name.get(),
opt.db_host.get(),
"scheduler",
str(self.environment),
)
self._db_monitor.start()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduler gets DB monitor as well

Comment on lines +343 to +354
async def get_db_status(self) -> DataBaseReport:
if self._db_monitor is None:
return DataBaseReport(
connected=False,
database="",
host="",
max_pool=0,
open_connections=0,
free_connections=0,
pool_exhaustion_count=0,
)
return await self._db_monitor.get_status()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new status endpoint on the scheduler to get database status

Comment on lines +161 to +162
metrics_reporter = MetricsService()
metrics_reporter.start_metric_reporters()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scheduler will now also report to influxdb

@@ -89,7 +89,7 @@ class SliceStatus(BaseModel):
"""

name: str
status: dict[str, ArgumentTypes]
status: Mapping[str, ArgumentTypes | Mapping[str, ArgumentTypes]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the FE, but allow nicer presentation

@@ -19,7 +19,7 @@
# flake8: noqa: F401

SLICE_SERVER = "core.server"
SLICE_AGENT_MANAGER = "core.agentmanager"
SLICE_AGENT_MANAGER = "core.scheduler_manager"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename for FE, hopefully OK

@@ -168,12 +169,63 @@ def __init__(self, closesessionsonstart: bool = True, fact_back_off: Optional[in

self.closesessionsonstart: bool = closesessionsonstart

async def get_status(self) -> dict[str, ArgumentTypes]:
return {
async def get_status(self) -> Mapping[str, ArgumentTypes | Mapping[str, ArgumentTypes]]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is very tricky, as it has to return withing the timeout of the status endpoint.

I chose to make it report what it has, even if that makes the format somewhat inconsistent.
(e.g. if the DB is down, the environments will be reported by UUID instead of name)

from inmanta.server import agentmanager


class DatabaseMonitor:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factored out!


def start_monitor(self) -> None:
"""Attach to monitoring system"""
suffix = f",component={self.component}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nasty way of adding tags, but it works

Comment on lines 104 to 107
gauge(
"db.pool_exhaustion_count" + suffix,
CallbackGauge(callback=lambda: self._db_pool_watcher._exhausted_pool_events_count),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new metric

if other.host != self.host:
return NotImplemented
return DataBaseReport(
connected=self.connected,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about different connected values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, I should probably and them

Comment on lines 106 to 108
suffix = f",component={self.component}"
if self.environment:
suffix += f",environment={self.environment}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suffix could be factored out, e.g. into a property, since it's also used in start_monitor.

tests/test_influxdbreporting.py Outdated Show resolved Hide resolved
stubs/pyformance/__init__.pyi Show resolved Hide resolved
src/inmanta/server/agentmanager.py Outdated Show resolved Hide resolved
@@ -74,44 +81,43 @@ async def get_status(self) -> DataBaseReport:
pool_exhaustion_count=self._db_pool_watcher._exhausted_pool_events_count,
)

def _add_guage(self, name: str, the_gauge: CallbackGauge) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _add_guage(self, name: str, the_gauge: CallbackGauge) -> None:
def _add_gauge(self, name: str, the_gauge: CallbackGauge) -> None:

if self.environment:
self.infuxdb_suffix += f",environment={self.environment}"

self.registered_guages: list[str] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.registered_guages: list[str] = []
self.registered_gauges: list[str] = []

src/inmanta/server/agentmanager.py Show resolved Hide resolved
Copy link
Contributor

@sanderr sanderr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only vaguelly familiar with all of this tbh.

change-type: minor
destination-branches: [master]
sections:
minor-improvement: "{{description}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this deserve a change entry, considering that the new scheduler has not been publicly released? Or has it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a user visible change, but perhaps I should add more detail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the entire scheduler is new, so I would assume that to be one change entry rather than a separate one for each prt it affects.

src/inmanta/server/agentmanager.py Show resolved Hide resolved
src/inmanta/server/agentmanager.py Show resolved Hide resolved
@@ -392,6 +392,25 @@ def action_function() -> None:
self._scheduled[task_spec] = handle
return task_spec

def schedule(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this method offer? I do prefer the name, but I feel like it's just a layer on top of add_action that makes a tiny streamline of the interface for one type of schedules, at the cost of dropping support for the other. So now we need to call either schedule() or add_action() depending on the use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed purely convenience. The ServerSlice has a similar method, with similar usecase.

I can remove it if you don't think it should be here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems like it adds more confusion than anything else. If I see a method schedule I'd assume that to be the main interface, but it isn't because it only supports one of the two schedule types.

So I prefer it without, but if you feel it's useful I won't stand in the way.

@@ -399,7 +390,7 @@ def get_extension_statuses(cls, slices: list["ServerSlice"]) -> list[ExtensionSt
result[ext_status.name] = ext_status
return list(result.values())

async def get_status(self) -> dict[str, ArgumentTypes]:
async def get_status(self) -> Mapping[str, ArgumentTypes | Mapping[str, ArgumentTypes]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change all these from dict to Mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to invariance of dict, the Union is a bit hard to use otherwise and it is inteded to be mapping all along

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Mapping is compatible with dict, so you can just return whatever you're actually putting in there, no?

I'm not necessarily against it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update pool metrics collector to also take into account the pool of the schedulers
3 participants